ABSTRACT


We present a comprehensive analysis of the evolution of the morphological and structural properties 
of a large sample of galaxies at z = 3-9 using early JWST CEERS NIRCam observations. Our sample
consists of 850 galaxies at z > 3 detected in both HST/WFC3 and CEERS JWST/NIRCam images, 
enabling a comparison of HST and JWST morphologies. We conducted a set of visual classifications, 
with each galaxy in the sample classified three times. We also measure quantitative morphologies across 
all NIRCam filters. We find that galaxies at z > 3 have a wide diversity of morphologies. Galaxies with 
disks make up 60% of galaxies at z = 3 and this fraction drops to ~30% at z = 6-9, while galaxies
with spheroids make up ~30-40% across the redshift range and pure spheroids with no evidence for
disks or irregular features make up ~20%. The fraction of galaxies with irregular features is roughly
constant at all redshifts (~40-50%), while the fraction that are purely irregular increases from ~12% to
~20% at z > 4.5. We note that these are apparent fractions, as many observational effects impact the
visibility of morphological features at high redshift. On average, Spheroid Only galaxies have a higher 
Sérsic index, smaller size, and higher axis ratio than Disk or Irregular galaxies. Across all redshifts, 
smaller spheroid and disk galaxies tend to be rounder. Overall, these trends suggest that galaxies with 
established disks and spheroids exist across the full redshift range of this study, and further work with
large samples at higher redshift is needed to quantify when these features first formed.


1. INTRODUCTION 


Between the era of early galaxy formation and today, 
galaxies have undergone dramatic transformations in all 
respects. They have produced successive generations of 
stars from clouds of molecular gas, continuously building 
up their stellar populations, while enriching the inter- 
stellar medium with heavy elements. The gas reservoir 
within galaxies changed as they converted a fraction of 
their supply of cold molecular gas into stars and fresh gas 
was replenished via inflow from the intergalactic medium 
(IGM). The overall star formation rate (SFR) density of 
the universe grew until it reached a peak at z ~ 2-3
(Madau & Dickinson 2014) and then began to decline
toward the low levels of the present day. The growth in the
stellar mass of galaxies coincided with a change in their 
physical structure as the overall massive galaxy popu- 
lation transitioned from disk-dominated spiral galaxies 
into bulge-dominated elliptical galaxies. Throughout 
this assembly process, the central supermassive black 
holes (SMBHs) of galaxies grew, leading to an estab- 
lished relationship between SMBH and stellar mass (e.g., 
McConnell et al. 2012). Tracking the evolution of the 
structural properties of galaxies can provide key insights 
into the galaxy evolution pathways responsible for each 
of these transformations. Probing the different physical 
processes driving the formation of disks and bulges, the 
growth of SMBHs, the onset of star formation, and its
subsequent cessation during a critical time in the uni- 
verse's history is important for testing theoretical galaxy 
formation models. 

Deep extragalactic surveys with the Hubble Space 
Telescope (HST) have revolutionized our understanding 
of galaxy evolution between the peak epoch of galaxy as- 
sembly 10 Gyr ago and today, but many open questions 
remain about the early phases of evolution within the 
first 3 Gyr. When do we see the first disks in galaxies in 
the early universe? At what point did the first bulges 
form and do the physical processes responsible for their 
formation change with redshift? Does the quenching 
of star formation precede or follow the morphological 
transformation in these early galaxies? 

To robustly address these questions, it is essential to
push our observations into the early universe, since most
of our current understanding of galaxies at high redshift
has come from galaxies at z = 1-3, the period of time
colloquially referred to as “cosmic noon.” Even though
this represents a time period 10 Gyr in the past, many 
galaxies at this time were already fairly mature and 
had structures, such as disks and bulges in star-forming 
galaxies, that generally resemble today's galaxies (e.g., 
Tacchella et al. 2015; Costantin et al. 2022a). Previous
large morphological studies of galaxies have typically
been limited to galaxies at z < 3 because cosmological
surface brightness dimming makes faint features
in high redshift galaxies hard to detect and because
the rest-frame optical emission that traces the
broad stellar populations in galaxies is shifted beyond
the wavelength capabilities of HST at higher redshifts.

Early morphological studies with WFPC2 and ACS 
on HST were ground-breaking, quantifying for the first 
time the fraction of galaxies of various Hubble types 
(i.e., barred and unbarred spirals, ellipticals, and irregu- 
lar galaxies) as a function of redshift, even beyond z ~ 1 
(e.g., Abraham et al. 1996; Giavalisco et al. 1996; Lowen- 
thal et al. 1997; Conselice et al. 2000; Jogee et al. 2004; 
Elmegreen et al. 2004; Sheth et al. 2008; Lotz et al. 2006; 
Ravindranath et al. 2006). However, at z > 1, these op- 
tical surveys probed the rest-frame UV light of galaxies 
and found that very large fractions of distant galaxies 
had peculiar or clumpy morphologies, which suggested 
at the time that the Hubble sequence had not yet formed 
at these early times (e.g., Abraham et al. 1996). Investi- 
gations using near-infrared observations with NICMOS, 
sensitive to the rest-frame optical structure of galaxies, 
found that galaxies beyond z ~ 1 presented a wide diver- 
sity of morphologies, including many objects that were 
compact or irregular, but also those that were morphologically
mature spirals and ellipticals (e.g., van Dokkum
& Franx 2001; Stanford et al. 2004; Papovich et al. 2005;
Daddi et al. 2005; Elmegreen et al. 2005). 

With the installation of WFC3 on HST in 2009, large 
samples of fainter galaxies at cosmic noon were observed. 
CANDELS, the Cosmic Assembly Near-infrared Deep
Extragalactic Legacy Survey (Koekemoer et al. 2011;
Grogin et al. 2011), obtained deep NIR imaging with
WFC3 over a total of ~800 arcmin². These observations
showed that while galaxies at z ~ 2 were overall
messier and clumpier, with larger fractions of mergers 
and generally irregular galaxies than today's universe, 
the general underpinnings of the Hubble sequence were 
already in place, i.e., a large fraction of star-forming 
galaxies were overall disk-like and passive galaxies were
overall compact or spheroid-like (e.g., Wuyts et al. 2011; 
Lee et al. 2013; Mortlock et al. 2013; van der Wel et al. 
2014; Kartaltepe et al. 2015; Zhang et al. 2019). This 
means that the first disks and spheroids must have be- 
gun to form at much earlier times. 

With its unprecedented sensitivity in the infrared, the 
James Webb Space Telescope (JWST) is poised to make 
remarkable discoveries about this transformative era in 
galaxy assembly and test key theoretical predictions of 
our understanding of the physics of the early universe. 
The four pointings of deep multi-band NIRCam observations
taken in 2022 June from the Cosmic Evolution
Early Release Science (CEERS) survey (Finkelstein et 
al., in prep.) provide the first opportunity for a compre- 
hensive analysis of the structural evolution of galaxies 
in the first 3 Gyr of the Universe's history. 

In this paper, we use these first CEERS observa- 
tions to conduct an early analysis of the evolution of 
galaxy morphology and structure for a large sample of 
HST/WFC3-selected galaxies at z = 3-9. This paper
is organized as follows. In Section 2, we describe the 
basics of the CEERS observations and our data reduc- 
tion, along with the ancillary data used to identify our 
sample of galaxies at z > 3. In Section 3, we describe
our morphological measurements, including visual classifications
and parametric and non-parametric morphologies.
We present our results in Section 4 and discuss
their implications in Section 5. Finally, we summarize 
our findings in Section 6. Throughout this paper, all 
magnitudes are expressed in the AB system and we as- 
sume a Chabrier (2003) Initial Mass Function (IMF). 
We also assume the following cosmological parameters: 
H₀ = 70 km s⁻¹ Mpc⁻¹, Ω_tot = 1, Ω_Λ = 0.7, and
Ω_m = 0.3.
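
To illustrate what the adopted cosmology implies for the epochs discussed here, the following minimal sketch evaluates the age of the universe at a given redshift using the closed-form expression for a flat universe containing only matter and a cosmological constant. Radiation is neglected, and the function and constant names are illustrative assumptions rather than anything taken from the analysis code used for this work.

```python
import math

# Cosmological parameters quoted in the text (flat Lambda-CDM).
H0_KM_S_MPC = 70.0
OMEGA_M = 0.3
OMEGA_LAMBDA = 0.7

# Unit conversion: the Hubble time 1/H0 expressed in Gyr.
KM_PER_MPC = 3.0856775814913673e19
SEC_PER_GYR = 3.15576e16  # 1e9 Julian years
HUBBLE_TIME_GYR = KM_PER_MPC / H0_KM_S_MPC / SEC_PER_GYR


def age_at_redshift(z):
    """Age (Gyr) of a flat matter + Lambda universe at redshift z.

    Closed form, neglecting radiation:
    t(z) = 2 / (3 H0 sqrt(OL)) * asinh(sqrt(OL / Om) * (1 + z)^(-3/2))
    """
    x = math.sqrt(OMEGA_LAMBDA / OMEGA_M) * (1.0 + z) ** -1.5
    prefactor = 2.0 / (3.0 * math.sqrt(OMEGA_LAMBDA))
    return HUBBLE_TIME_GYR * prefactor * math.asinh(x)


if __name__ == "__main__":
    for z in (0.0, 3.0, 6.0, 9.0):
        print(f"z = {z:.0f}: age = {age_at_redshift(z):.2f} Gyr")
```

Under these assumptions the age is about 2.1 Gyr at z = 3 and about 0.5 Gyr at z = 9, consistent with the "first 3 Gyr" framing used above.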


2. DATA 
2.1. CEERS Observations and Data Reduction 
CEERS (Finkelstein et al., in prep.) is an Early Release
Science (ERS) program (Proposal ID #1345) that
will observe the EGS (Extended Groth Strip, Davis et al. 
2007) extragalactic deep field (one of the five CANDELS 
fields; Koekemoer et al. 2011; Grogin et al. 2011) early 
in Cycle 1 with data made available to the public im- 
mediately. CEERS will obtain observations in several 
different modes with JWST, including a mosaic of 10 
pointings with NIRCam, NIRSpec multi-object spectro- 
scopic observations in parallel for six pointings, and six 
pointings with MIRI in parallel. The NIRCam imaging 
of CEERS will cover a total of 100 sq. arcmin with the 
F115W, F150W, F200W, F277W, F356W, F410M, and 
F444W filters down to a 5σ depth ranging from 28.8 to
29.7 (for a typical total exposure time of 2835 s per
filter). The first set of CEERS observations was taken
on 2022 June 21 in four pointings, hereafter referred to 
as CEERS1, CEERS2, CEERS3, and CEERS6. 

We performed an initial reduction of the NIRCam images
in all four pointings using version 1.7.2 of the JWST
Calibration Pipeline¹ with some custom modifications.
We used the current set of NIRCam reference files²,
though we note that the majority were created pre-flight,
including the flats. For details on the reduction
steps, see Bagley et al. (2022). To summarize briefly, 
we applied detector-level corrections using Stage 1 of 
the pipeline with default parameters and used custom 
scripts to remove 1/f noise, wisps, and snowballs from
the count-rate maps. We then processed the cleaned 
count-rate maps with Stage 2 of the pipeline and then 
used a custom version of the TweakReg step to calibrate 
the astrometry. We then coadded the images using the 
drizzle algorithm with an inverse variance map weight- 
ing (Casertano et al. 2000; Fruchter & Hook 2002) in 
the Resample step in the pipeline. The final mosaics for 
each pointing in all filters have pixel scales of 0.03″/pixel.
We then used a custom script to background subtract 
the images. These final background-subtracted images 
were used for the morphology measurements described 
in Section 3. 


2.2. CANDELS Images and Catalogs


For the analysis in this paper, before updated catalogs 
incorporating JWST photometry are available, we use 
existing CANDELS v2 redshifts and stellar masses for 
the HST F160W-selected galaxies in the EGS field. Here 
we provide a brief summary of these measurements; for 
full details on the photometric redshift measurements 
and resulting catalogs, see Kodra et al. (2022). 


1 jwst-pipeline.readthedocs.io 


2 jwst-crds.stsci.edu, jwst_0989.pmap, jwst_nircam_0232.imap


The v2 photometric redshifts and stellar masses are
based on the CANDELS EGS photometric catalog of 
Stefanon et al. (2017), which includes broadband TFIT 
(Laidler et al. 2007) photometry from the UV to IR 
imaging from both ground- and space-based telescopes. 
For this paper, we adopt the z_best column from the
Kodra et al. (2022) catalog, which provides the overall
best estimate of the redshift. This corresponds to the
secure spectroscopic redshift if one is available or the
mFDa4_z_weight photometric redshift otherwise. We use
this photometric redshift value because it produces the 
most accurate confidence intervals (see Kodra et al. 2022 
for further details). 
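
The selection rule described above (spectroscopic redshift when a secure one exists, photometric redshift otherwise) can be sketched as follows. This is only an illustration: the `z_spec` key, the convention that a missing spectroscopic redshift is stored as `None`, NaN, or a non-positive value, and the example rows are all assumptions, not the actual layout of the Kodra et al. (2022) catalog.

```python
import math


def best_redshift(z_spec, z_phot):
    """Return (redshift, source), preferring a usable spectroscopic value.

    A spectroscopic redshift is treated as missing when it is None, NaN,
    or non-positive (an assumed convention for this sketch).
    """
    if z_spec is not None and not math.isnan(z_spec) and z_spec > 0:
        return z_spec, "spec"
    return z_phot, "phot"


# Made-up rows, for illustration only.
catalog = [
    {"id": 1, "z_spec": 3.21, "mFDa4_z_weight": 3.05},
    {"id": 2, "z_spec": None, "mFDa4_z_weight": 4.40},
    {"id": 3, "z_spec": float("nan"), "mFDa4_z_weight": 6.80},
]

for row in catalog:
    z, source = best_redshift(row["z_spec"], row["mFDa4_z_weight"])
    print(row["id"], z, source)
```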

We then use the Stefanon et al. (2017) photometry 
and the above estimated redshift to determine stellar 
masses for galaxies in the CANDELS EGS field. This 
was done using two different codes: Dense Basis³ and
P12. Dense Basis (Iyer & Gawiser 2017; Iyer et al.
2019) is a Python-based code that uses Flexible Stellar
Population Synthesis (FSPS) to generate model spectra 
that correspond to a wide range of physically motivated 
non-parametric star formation histories (SFHs), metal- 
licities, and dust attenuation values. P12 (Pacifici et al. 
2012) is a Fortran-based code that uses a Bayesian fit- 
ting algorithm and model SEDs generated using simple 
stellar population models and the reprocessing of these 
models using the photo-ionization code CLOUDY (Ferland 
et al. 1998). For more details on each of these two 
methods and detailed comparisons, see Iyer & Gawiser 
(2017), Iyer et al. (2019), Pacifici et al. (2012), and Paci- 
fici et al. (2022). 

We find that the stellar masses from these two methods
agree well with one another, with the expected level
of scatter (e.g., Pacifici et al. 2022). For this paper, we
use the mean of the two measurements. We note the
caveat that these measurements are highly uncertain at
the highest redshifts (z > 6) and the faintest magnitudes
(F160W > 26) because they rely on HST ACS and
WFC3 photometry, which does not trace the rest-frame
optical light at these redshifts. Future work will improve
on these measurements with the addition of JWST NIRCam
fluxes.

In addition to the JWST CEERS images described 
above, we also use the existing CANDELS ACS and 
WFC3 images to compare the HST and JWST morphologies.
Here we use the mosaics produced by the
CEERS team⁴ with updated astrometry tied to Gaia-EDR3
and a pixel scale of 0.03″/pixel.


3 https://github.com/kartheikiyer/dense_basis
4 https://ceers.github.io/releases.html#hdr1
Figure 1. Stellar mass versus redshift for the z > 3 sample
used in this paper. Above and to the right are the distributions
of redshift and stellar mass, respectively. The horizontal
dashed line at log(M*/M⊙) = 9 corresponds to the mass
cut used for the subsample of objects in Section 4.1.


2.3. Sample Selection 


For the sample analyzed in this paper, we select all 
galaxies with z_best > 3 within the CEERS1, CEERS2,
CEERS3, and CEERS6 NIRCam SW or LW footprints.
We cross-match these sources with those identified in
NIRCam imaging using the Source Extractor segmentation
map (see Section 3.1 for more details). We remove any
spurious sources in the NIRCam imaging, such as those 
that result from the diffraction spikes of stars and those 
that are so close to the edge of the images that we cannot 
obtain reliable morphology measurements. This results 
in a total sample of 850 sources with a detection in any of 
the NIRCam filters. We note here that at the magnitude 
limit of the CANDELS WFC3 images (F160W < 27.6,
5σ) almost all of these galaxies have S/N in the NIRCam
images high enough to enable morphological classifications,
as discussed further in Section 3.

Figure 1 shows the redshift and mass distribution for 
this sample of objects. We note that ten of these ob- 
jects have existing spectroscopic redshifts from either 
the MOSDEF (Kriek et al. 2015) or DEEP2 (Newman 
et al. 2013) spectroscopic surveys; we use these spec- 
troscopic redshifts for these ten objects. Overall, this 
sample peaks at z ~ 3 and has a long tail beyond z ~ 6
out to z ~ 9. There are a total of 40 sources in this sample
with a photometric redshift of z > 6. We note here
that the redshifts and stellar masses for these sources 
are uncertain, and some of these may turn out to be at 
lower redshift. The redshift and stellar mass estimates 
will be improved with the addition of JWST data to the 
SED modeling in the future. 


3. MEASUREMENTS 


3.1. Source Extractor Setup 


Galaxies were detected in the NIRCam images using 
Source Extractor⁵ version 2.25.0 (Bertin & Arnouts
1996). The setup was optimized to detect the HST-selected
z > 3 galaxies without over-deblending. We
created empirical PSFs for each filter by stacking stars,
and then the F115W, F150W, F200W, and F277W
images were PSF-matched to the F356W image using
the Python-based code PyPHER⁶ (Boucaud et al.
2016). In order to detect galaxies that may be very 
faint in some of the NIRCam filters, we used an inverse- 
variance weighted combination of the PSF-matched 
F150W, F200W, F277W, and F356W images as the de- 
tection image. We first ran Source Extractor in a 
“cold” mode to deblend nearby galaxies, then in “hot” 
mode to detect faint galaxies, following Stefanon et al. 
(2017). We then combined the “cold” and “hot” detec- 
tions in order to keep all objects that were detected by 
at least one mode, using a factor of 2.5 to enlarge the 
cold isophotes. We visually inspected the segmentation 
map for the z > 3 sources and optimized the parameters 
to ensure that the sources were detected and adequately 
deblended from nearby neighbors without being shred- 
ded. We use the final segmentation map produced from 
this process for all of the measurements presented in this 
section. 
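
The combination of the "cold" and "hot" detections can be sketched as follows. This is a simplified illustration rather than the procedure actually used: it assumes that each cold isophote can be approximated by an ellipse (semi-axes `a` and `b` in pixels, position angle `theta` in radians) and that a hot detection is discarded when its centroid falls inside a cold ellipse enlarged by the factor of 2.5 quoted above. The dictionary keys and the example sources are hypothetical.

```python
import math

ENLARGE = 2.5  # enlargement factor for the cold isophotes, as quoted in the text


def inside_ellipse(x, y, src, scale):
    """True if (x, y) lies inside the ellipse of `src` enlarged by `scale`."""
    dx, dy = x - src["x"], y - src["y"]
    cos_t, sin_t = math.cos(src["theta"]), math.sin(src["theta"])
    # Rotate the offset into the frame aligned with the ellipse axes.
    u = dx * cos_t + dy * sin_t
    v = -dx * sin_t + dy * cos_t
    return (u / (scale * src["a"])) ** 2 + (v / (scale * src["b"])) ** 2 <= 1.0


def merge_cold_hot(cold, hot, scale=ENLARGE):
    """Keep every cold source, plus hot sources outside all enlarged cold ellipses."""
    merged = list(cold)
    for h in hot:
        if not any(inside_ellipse(h["x"], h["y"], c, scale) for c in cold):
            merged.append(h)
    return merged


# Hypothetical sources: one cold detection and two hot detections.
cold = [{"id": "c1", "x": 50.0, "y": 50.0, "a": 4.0, "b": 2.0, "theta": 0.0}]
hot = [
    {"id": "h1", "x": 55.0, "y": 50.0, "a": 1.0, "b": 1.0, "theta": 0.0},
    {"id": "h2", "x": 70.0, "y": 50.0, "a": 1.0, "b": 1.0, "theta": 0.0},
]
merged = merge_cold_hot(cold, hot)
print([s["id"] for s in merged])
```

In this toy case the first hot source lies within the enlarged cold ellipse and is treated as part of the cold object, while the second is kept as a separate faint detection.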


3.2. Visual Classifications 


Each of the 850 galaxies in our z > 3 sample was classified
by three different people from among a total of 35
members of our team. We used the Zooniverse project
builder⁷ to host images of each galaxy and designed a
workflow of five tasks based on a modified version of the
classification scheme of Kartaltepe et al. (2015). These
five tasks ask classifiers to select options for the galaxy's
main morphology class, its interaction class, various
structural and quality flags, and finally to leave any specific
comments about a particular object. In this paper,
we focus on the first of these five tasks, the main mor- 
phology classification, which roughly corresponds to the 
typical Hubble type classification. The options for the 
main morphology class are: 1) Disk, 2) Spheroid, 3)
Irregular / Peculiar, 4) Point Source / Unresolved, and 
5) Unclassifiable / Junk. To reflect the overall diversity 
seen in high redshift galaxies, these classes are not mutu- 
ally exclusive, so a classifier can choose multiple options 
to best reflect the overall morphology of the galaxy. For 


5 https://sextractor.readthedocs.io/
6 https://pypher.readthedocs.io/en/latest/ 


7 https://www.zooniverse.org/lab
example, a galaxy can have both a disk and a spheroid
in the case that it is a disk galaxy with a bulge com- 
ponent. A galaxy can be both a disk and irregular, if 
for example it is an asymmetric disk or a disk involved 
in an interaction. The exception is that if a galaxy is 
‘Unclassifiable’ then it cannot also be one of the other 
classes. This level of complexity can make the interpre- 
tation of the various classes challenging, but the extra 
information provides an important level of nuance to the 
classifications. 

Classifiers are presented with a collection of postage 
stamps for each galaxy being classified. These stamps 
include each of the JWST NIRCam filters (F115W,
F150W, F200W, F277W, F356W, F410M, and F444W)
at their native resolution, with an asinh scaling to bring
out low surface brightness features; an RGB stamp made
up of the filters that correspond to the rest-frame optical;
a version of that stamp zoomed out by a factor
of two; the NIRCam Source Extractor segmentation
map described above; three HST ACS/WFC3
filters (F814W, F125W, and F160W, also with an asinh
scaling); and finally an RGB stamp of these three
HST filters and a similarly zoomed out version. The
stamps are scaled by the size of the galaxy as measured 
by Source Extractor, following Equations 2 and 3 of 
Häußler et al. (2007), with a minimum size of 100×100
pixels. An example set of stamps for one of the galaxies 
is shown in the Appendix. 

The classifiers are asked to make a holistic judgement
about the overall morphology of the galaxy, taking in- 
formation across the full wavelength range into account. 
In a separate task, the classifiers can select flags to in- 
dicate that the morphology changes across the NIRCam 
filters or differs between JWST and HST images. 


3.3. Parametric Fits 


We perform parametric fits on the NIRCam images 
using both Galfit⁸ (Peng et al. 2002; Peng et al. 2010)
and GalfitM⁹ (Häußler et al. 2013). Galfit is a least-squares
fitting algorithm that finds the optimum Sérsic
fit to a galaxy's light profile and GalfitM is a modified
version that uses images at different wavelengths
and constrains the fit parameters to vary
smoothly as a function of wavelength. The benefit of using
GalfitM is that it fits all bands simultaneously and
produces more physically consistent models. We performed
fits using both codes to test for self-consistency,
but since we use the rest-frame optical fit throughout 


8 https://users.obs.carnegiescience.edu/peng/work/galfit/galfit.html 


9 https://www.nottingham.ac.uk/astronomy/megamorph/


this paper, we focus here on the GalfitM fits and de- 
scribe the Galfit fits in the Appendix. 

GalfitM fits were performed using the IDL program 
Galapagos-2 from the MegaMorph Project¹⁰ (Bamford
et al. 2011; Vika et al. 2013; Häußler et al. 2013, 2022).
Galapagos-2 is a wrapper that enables GalfitM to be 
run over larger survey images. We used the Source 
Extractor setup described above with Galapagos-2. 
As input, we used all seven NIRCam filters (F115W, 
F150W, F200W, F277W, F356W, F410M, and F444W) 
and used the NIRCam WHT images produced by the 
JWST pipeline to create RMS images to be used as 
the input sigma image. We used the F200W Source 
Extractor catalog for initial guesses and used the NIR- 
Cam empirical PSFs. In addition to the final output 
catalog with the Sérsic fit parameters, Galapagos-2 also 
outputs the original stamp, the GalfitM model, and the 
residual image for each galaxy in each filter. Out of the 
850 z > 3 galaxies in our sample, 37 (4%) were flagged 
because GalfitM reached one of the constraint limits in 
one of the filters. 
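
One plausible way to turn a weight (WHT) image into the RMS image used as the input sigma image is sketched below. It assumes that the weight map stores inverse variance, in line with the inverse-variance weighting used in the drizzle step of Section 2.1, so that the per-pixel RMS is 1/sqrt(WHT). The handling of zero-weight pixels with a large sentinel value is a choice made for this sketch only, and the function name is hypothetical.

```python
import math


def weight_to_rms(wht, bad_value=1.0e10):
    """Convert a 2D inverse-variance weight map (list of rows) to an RMS map.

    Pixels with non-positive weight carry no information, so they are
    assigned a very large RMS (`bad_value`) instead of dividing by zero.
    """
    rms = []
    for row in wht:
        rms.append([1.0 / math.sqrt(w) if w > 0 else bad_value for w in row])
    return rms


# Tiny made-up weight map, for illustration only.
example_wht = [[4.0, 0.0], [0.25, 1.0]]
print(weight_to_rms(example_wht))
```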


3.4. Non-parametric Measurements 


We measure non-parametric morphologies using the 
Python package Statmorph¹¹ (Rodriguez-Gomez et al.
2019). For each NIRCam filter, we create 100×100 pixel
cutouts of the 850 galaxies in our z > 3 sample to use 
as input to Statmorph, along with a cutout of the seg- 
mentation map, the empirical PSF, and the gain. 

Statmorph measures a wide range of morphology 
statistics commonly used in astrophysics. The ones that 
we use for the analysis in this paper are: concentration 
(C), asymmetry (A), and clumpiness/smoothness (S) 
(Bershady et al. 2000; Conselice et al. 2000; Conselice 
2003); the Gini coefficient (G) and the second moment
of the region of the galaxy containing 20% of the total
flux (M20) (Abraham et al. 2003; Lotz et al. 2004);
the Gini-M20 bulge and merger statistics (Rodriguez-Gomez
et al. 2019); the signal-to-noise per pixel; and
quality flags.
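
As an illustration of one of these statistics, the Gini coefficient can be computed directly from the pixel fluxes assigned to a galaxy. The sketch below follows the commonly used definition based on the rank-ordered absolute pixel values (as in Lotz et al. 2004); it is a stand-alone illustration and not the Statmorph implementation, which also handles the choice of pixels and other details.

```python
def gini(pixel_fluxes):
    """Gini coefficient of the pixel fluxes belonging to one galaxy.

    G = sum_i (2i - n - 1) |X_i| / (mean(|X|) * n * (n - 1)),
    with the |X_i| sorted in increasing order and i = 1..n.
    G = 0 for perfectly uniform light; G = 1 if one pixel holds all the flux.
    """
    values = sorted(abs(f) for f in pixel_fluxes)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two pixels")
    mean = sum(values) / n
    if mean == 0:
        raise ValueError("all pixel fluxes are zero")
    total = sum((2 * i - n - 1) * v for i, v in enumerate(values, start=1))
    return total / (mean * n * (n - 1))


print(gini([1.0] * 10))          # uniform light distribution
print(gini([0.0, 0.0, 0.0, 5.0]))  # all flux in a single pixel
```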

As for the parametric fits, we use the measurements for the filter
corresponding to the rest-frame optical emission at the
redshift of the galaxy: F277W for galaxies at z = 3.0-4.0,
F356W for galaxies at z = 4.0-4.5, and F444W for
galaxies at z > 4.5. Of the 850 galaxies fit, 81% have
a signal-to-noise per pixel of > 2.5 in the corresponding 
rest-frame optical filter; below this value, the fit results 
may not be reliable (Lotz et al. 2006). We compare these 


10 https://www.nottingham.ac.uk/astronomy/megamorph/
11 https://statmorph.readthedocs.io/en/latest/
Figure 2. NIRCam F150W+F277W+F356W postage
stamp cutouts of a selection of example galaxies in each of
the seven morphological groups described in Section 4.1 at
three different redshift bins. Each cutout is 2" on a side. 


commonly used measures of galaxy morphology to our 
visual classifications in Section 4.3. 


4. RESULTS 
4.1. Visual Classifications 


For each of the 850 galaxies in our z > 3 sample, we 
assign a galaxy a given visual classification if two out of 
three people select a given option as the main morpho- 
logical class. There is only one object in our sample for 
which all three classifiers disagree, meaning one selected 
only ‘disk’, one selected only ‘spheroid’, and one selected 
only ‘irregular’. This object is therefore not included
in any of the figures presented here. As noted above, 


since the main morphological classes are not mutually 
exclusive, various combinations are possible. Through- 
out this paper, we break things down into the following 
non-exclusive morphological groups (highlighted in Fig- 
ure 2): 


• Galaxies with Disks: The Disk category contains
galaxies classified as Disk Only (without a
spheroid or irregular component), Disk+Spheroid
(a galaxy with both a disk and spheroid component;
a separate structural flag indicates whether
the disk or the bulge is dominant), Disk+Irregular
(a disk galaxy with irregularities such as asymmetries,
a warp, or disturbance by a nearby companion),
and Disk+Spheroid+Irregular (a disk galaxy
with a spheroid component that also has some irregularities;
note that these are fairly rare). When
we refer to ‘All Disks’ we are referring to the sum
of the galaxies in all of these categories.


• Galaxies with Spheroids: The Spheroid category
contains galaxies classified as Spheroid
Only (without a disk or irregular component),
Spheroid+Disk (same as Disk+Spheroid
above), Spheroid+Irregular (a spheroid galaxy
with irregularities such as asymmetries or
surrounding low surface brightness features),
and Spheroid+Disk+Irregular (same as
Disk+Spheroid+Irregular above). When we refer
to ‘All Spheroids’ we are referring to the sum
of the galaxies in all of these categories.


• Galaxies with Irregular Features: The Irregular
category contains galaxies classified as Irregular
Only (no discernible disk or spheroid component),
Irregular+Disk (same as Disk+Irregular above),
Irregular+Spheroid (same as Spheroid+Irregular
above), and Irregular+Disk+Spheroid (same as
Disk+Spheroid+Irregular above). When we refer
to ‘All Irregulars’ we are referring to the sum
of the galaxies in all of these categories. Note
that the Irregular category may include merging
or interacting systems, but also galaxies that are
irregular for other reasons, such as clumpy star
formation. Mergers and interactions themselves
will be discussed in a future paper (Rose et al., in
prep).
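
The two-out-of-three rule and the resulting non-exclusive groups described above can be sketched as follows. The data layout (one set of selected options per classifier), the function names, and the treatment of objects with no consensus as belonging to no group are assumptions made for this illustration and do not describe the actual Zooniverse export or analysis code.

```python
from collections import Counter

MAIN_CLASSES = ("Disk", "Spheroid", "Irregular", "Point Source", "Unclassifiable")


def consensus_classes(votes, min_votes=2):
    """Return the main classes selected by at least `min_votes` classifiers.

    `votes` is a list with one set of selected options per classifier;
    the options are not mutually exclusive.
    """
    counts = Counter(label for vote in votes for label in set(vote))
    return {label for label in MAIN_CLASSES if counts[label] >= min_votes}


def group_name(classes):
    """Map a consensus set onto a group label such as 'Disk+Irregular'."""
    parts = [c for c in ("Disk", "Spheroid", "Irregular") if c in classes]
    if not parts:
        return None  # point source, unclassifiable, or no consensus
    if len(parts) == 1:
        return parts[0] + " Only"
    return "+".join(parts)


votes = [{"Disk", "Irregular"}, {"Disk"}, {"Disk", "Irregular"}]
print(group_name(consensus_classes(votes)))
```

With this rule, an object for which the three classifiers select only ‘disk’, only ‘spheroid’, and only ‘irregular’, respectively, ends up with no consensus class, matching the single excluded object mentioned above.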


Our sample contains the full range of morphological 
types across all redshifts and stellar masses. Over the
entire redshift range, only 16 and 18 galaxies are clas- 
sified as Point Source/Unresolved or Unclassifiable, re- 
spectively. Figure 3 shows the fraction of the total num- 
ber of galaxies that each morphological class makes up 
Figure 3. The fraction of z > 3 galaxies detected by both JWST and HST with M* > 10⁹ M⊙ as a function of redshift for each
morphology class. The top row, from left to right, shows galaxies with disks, galaxies with spheroids, galaxies with irregular
features, and Point Sources and Unclassifiable galaxies. The bottom row shows all of the same morphological groups, but
divided in different ways for easy comparison. From left to right: the combination of all disks, all spheroids, and all irregulars;
the combination of Disk Only, Spheroid Only, and Irregular Only groups; and finally, the remaining mixed groups. Error bars
represent the 1σ binomial confidence limits given the number of objects in each category, following the method of Cameron
(2011).


as a function of redshift. For a fair comparison across 
redshifts, we limit this to the 666 galaxies with stellar 
masses greater than 10⁹ M⊙ since the galaxies with lower
stellar masses are only present at the low redshift end 
of our sample (see Fig. 1). We emphasize that this rep- 
resents a mass-selected sample but not a mass-complete 
sample, as there are likely to be many galaxies identi- 
fied by JWST in this mass range that are undetected by 
CANDELS HST imaging. 
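
The 1σ binomial confidence limits shown in Figure 3 follow Cameron (2011), who recommends taking quantiles of the beta distribution. A minimal stand-alone sketch is given below. It assumes a uniform prior, so that the posterior for k objects out of n is Beta(k + 1, n − k + 1), an equal-tailed interval, and a confidence level of 0.683. Because both beta parameters are then integers, the beta cumulative distribution can be written as a binomial tail sum and inverted by bisection using only the standard library. The function names and the example counts are illustrative, and the direct sum is only suitable for samples of up to roughly a thousand objects.

```python
from math import comb


def beta_cdf_int(x, a, b):
    """Regularized incomplete beta function I_x(a, b) for positive integers a, b.

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a).
    """
    n = a + b - 1
    return sum(comb(n, j) * x ** j * (1.0 - x) ** (n - j) for j in range(a, n + 1))


def beta_quantile_int(p, a, b, tol=1e-10):
    """Invert the beta CDF by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta_cdf_int(mid, a, b) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def binomial_interval(k, n, level=0.683):
    """Equal-tailed interval on a fraction k/n, from Beta(k + 1, n - k + 1)."""
    tail = 0.5 * (1.0 - level)
    a, b = k + 1, n - k + 1
    return beta_quantile_int(tail, a, b), beta_quantile_int(1.0 - tail, a, b)


# Illustrative counts only: 5 objects of a given class in a bin of 38.
lower, upper = binomial_interval(5, 38)
print(f"fraction = {5 / 38:.3f}, interval = [{lower:.3f}, {upper:.3f}]")
```

Unlike the simple normal approximation, this interval stays within [0, 1] and remains well defined when k = 0 or k = n, which matters for the sparsely populated high-redshift bins.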

Overall, 56% of the galaxies above this mass cut at 
z > 3 have a visually identifiable disk component, dropping
from ~60% at z = 3-4, to ~45% at z ~ 5,
to ~30% at z > 6. Again, we note that the photometric
redshifts at z > 6 are uncertain, so some fraction
of these sources may actually be at lower redshift;
we caution the reader when interpreting the results at
z > 6. The Disk Only and the Disk+Irregular groups
each make up ~ 20% (and slightly less at z > 6) while 
the Disk+Spheroid group makes up ~ 10% and the 
Disk+Spheroid+Irregular group makes up < 5%. 38% 
of the galaxies at z > 3 have a visually identifiable 
spheroidal component, decreasing from 42% to 26% be- 
tween z = 3 and 4.5, then varying between ~30-40% 
beyond z = 4.5. This is largely driven by the similar de- 
crease then increase in the Spheroid Only group. Part 
of this apparent trend at higher redshifts may be due 
to small number statistics and part may be due to a 
number of selection effects. For example, there is a possibility
that we miss fainter extended features in some
of these systems at high redshift. It is also possible that 
a larger fraction of galaxies at higher redshift are small 
enough to be at the resolution limit of NIRCam, given 
the expected size evolution of galaxies, and are therefore 
more round and compact in appearance. 

43% of the galaxies at z > 3 have irregular features
and this fraction remains roughly constant across
the full redshift range, due in part to the fraction of
Disk+Irregular galaxies being roughly constant at 20%
and then decreasing while the fraction of Irregular Only
galaxies is roughly 10-15% and then increases to
20% by z = 4.5. Note that the total fractions of objects
that are All Disks, All Spheroids, or All Irregulars do not
add up to one because some objects belong to more than one of
these classes.

Finally, we note that the fraction of point sources and 
unclassifiable objects remains at 0-2% across most of
the redshift range. At z > 6, 13% of galaxies are un- 
resolved and 8% are unclassifiable, corresponding to 5 
and 3 individual galaxies, respectively, in this redshift 
bin. We remind the reader that the above percentages 
correspond to galaxies that were bright enough to be 
detectable with HST CANDELS imaging and may not 
be representative of the overall galaxy sample detectable 
by JWST at these redshifts. 


4.2. Comparison with Surface Brightness Profile Fits 
Figure 4. Stacked histograms illustrating the distributions of the rest-frame optical Sérsic Index (n), effective radius (Re), and
axis ratio (b/a) for the z > 3 galaxy sample. The colors indicate the different combinations of the main morphological class 
chosen by two out of three people during the visual classifications, as described in Section 4.1 and Figure 2. 


One of the major advances that JWST NIRCam imag- 
ing brings to morphological analyses of galaxies is that 
the broad wavelength coverage enables us to probe the 
rest-frame optical morphologies of galaxies across a wide 
redshift range. As described in Section 3.3, we used 
GalfitM to perform multiwavelength parametric fits 
across all of the NIRCam filters. For a fair comparison 
of these parameters at different wavelengths, we select 
the NIRCam filter closest to the rest-frame optical at 
different redshifts. Throughout this section, we use the 
F277W filter for galaxies at 3.0 < z < 4.0, F356W for 
galaxies at 4.0 < z < 4.5, and F444W for galaxies at 
z > 4.5. We compare these measurements to the visual 
classifications in Figures 4 and 5. 
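The filter choice described above amounts to a simple lookup on redshift. The following Python sketch is one way to express it; the function names, the handling of the bin edges at z = 4.0 and z = 4.5, and the nominal filter wavelengths (read off the filter names, so only approximate) are assumptions made for illustration.

```python
# Nominal central wavelengths in microns, taken from the filter names
# (assumed values, approximate only).
NOMINAL_WAVELENGTH_UM = {"F277W": 2.77, "F356W": 3.56, "F444W": 4.44}


def rest_frame_optical_filter(z):
    """Pick the NIRCam filter used as the rest-frame optical proxy.

    The text does not say which bin the exact edges belong to;
    assigning them to the higher-redshift bin is an assumption.
    """
    if z <= 3.0:
        raise ValueError("the sample only includes galaxies at z > 3")
    if z < 4.0:
        return "F277W"
    if z < 4.5:
        return "F356W"
    return "F444W"


def rest_wavelength_um(z):
    """Approximate rest-frame wavelength probed by the chosen filter."""
    band = rest_frame_optical_filter(z)
    return NOMINAL_WAVELENGTH_UM[band] / (1.0 + z)
```

For example, a galaxy at z = 3.5 is assigned F277W, which samples a rest-frame wavelength of roughly 0.6 microns.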

Overall, the distribution of Sérsic indices (where n = 
0.5 corresponds to a Gaussian profile, n = 1 to an exponential 
profile, and n = 4 to a de Vaucouleurs profile) 
tracks the expectations from the visual classifications. 
Disk galaxies with no apparent spheroid or irregular 
features (Disk Only) peak at low Sérsic indices 
with a long tail out to higher values, as expected (median 
⟨n⟩ = 1.16; the 16th-84th percentile range of each 
distribution is listed in Table 1). Galaxies that are pure 
spheroids (Spheroid Only) have a much broader distribution 
and peak at higher n (⟨n⟩ = 2.46), as has 
been noted at lower redshift based on HST imaging (e.g., 
Vika et al. 2015; Kartaltepe et al. 2015). Galaxies with 
both a disk and spheroidal component (Disk+Spheroids) 
peak at intermediate values (⟨n⟩ = 2.39). This illustrates 
that a cut at a fixed Sérsic index would not 
cleanly select disk- or spheroid-dominated galaxies. For 
example, a dividing line of n = 2 would identify 71% of 
the visually identified disks and only 45% of the visually 
identified spheroids. However, it is worth noting 
that a fraction of the objects visually identified as 
spheroids with low n might have extended low surface 
brightness disks that are difficult to pick out by eye. 
Irregular galaxies with no disk or spheroid component 
peak at very low n, with a substantial fraction 
at n < 1 and a long tail out to higher values 
(⟨n⟩ = 1.19). Disk galaxies with irregular features 
(Disk+Irregular) peak closer to n = 1 with a narrower 
distribution that more closely resembles that of the 
Disk Only galaxies. Likewise, the distribution of Sérsic 
indices for the spheroidal galaxies with irregular features 
(Spheroid+Irregular) closely resembles that of the 
Spheroid Only group. A visual inspection of the models 
and residuals for the irregular galaxy population reveals 
that, unsurprisingly, irregular features are not well-fit by 
a Sérsic profile. For disks and spheroidal galaxies with 

Figure 5. The rest-frame optical Sérsic index (n) plotted as a function of the effective radius (Re) in kpc (top left) and the 
axis ratio (b/a) (top right) for the z > 3 galaxy sample. The axis ratio plotted as a function of the effective radius is shown on 
the bottom left. The colors of each point indicate the different combinations of the main morphological class chosen by two out 
of three people during the visual classifications, as described in Section 4.1 and Figure 2. The median values for each group are 
shown as stars with the error bars representing the 16th-84th percentile range of the distribution. 


irregular features, the model fits the disk/spheroidal 
component well and leaves behind features in the residuals, 
while the Irregular Only population is not well fit 
at all. We caution the reader against over-interpreting 
Sérsic indices for irregular galaxies and against using 
Sérsic indices to select disk galaxies without first check- 
ing the images (and residuals) for irregular features. 
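For reference, the Sérsic profile underlying these fits can be written as I(r) = I_e exp{-b_n [(r/R_e)^(1/n) - 1]}, where R_e encloses half of the total light. The Python sketch below evaluates it. The function names are illustrative, and b_n is computed from the series approximation of Ciotti & Bertin (1999), which is an assumption of this example and not necessarily what GalfitM evaluates internally.

```python
import math


def sersic_b(n):
    """Approximate b_n, the constant that makes R_e the half-light radius.

    Series approximation of Ciotti & Bertin (1999); adequate for the
    range n >= 0.5 discussed in the text.
    """
    return 2.0 * n - 1.0 / 3.0 + 4.0 / (405.0 * n) + 46.0 / (25515.0 * n ** 2)


def sersic_profile(r, r_e, n, i_e=1.0):
    """Surface brightness at radius r for a Sersic profile.

    i_e is the surface brightness at the effective radius r_e.
    """
    return i_e * math.exp(-sersic_b(n) * ((r / r_e) ** (1.0 / n) - 1.0))


# Central concentration: brightness at 0.1 R_e relative to R_e.
exponential_contrast = sersic_profile(0.1, 1.0, 1.0)
de_vaucouleurs_contrast = sersic_profile(0.1, 1.0, 4.0)
```

The n = 4 profile is far more centrally peaked than the n = 1 profile at fixed R_e, which is why high Sérsic indices are associated with compact, spheroid-like light distributions.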
The center panel of Figure 4 shows the distribution of 
sizes (effective radii, Re) measured by GalfitM for each 
of the morphological types. Galaxies with disks and irregular 
features generally have larger sizes than those 
with spheroids. For example, galaxies with disks only 
have a median effective radius of ⟨Re⟩ = 1.19 kpc, irregular 
only have ⟨Re⟩ = 1.32 kpc, while spheroid only 
galaxies have ⟨Re⟩ = 0.54 kpc. These trends are 
seen more clearly in Figure 5. The Disk+Irregular and 
Disk+Spheroid+Irregular groups have size distributions 
that more closely match the distribution for the Irregular 
Only group, while the Spheroid+Irregular group has 
a smaller median size (⟨Re⟩ = 0.74 kpc). 
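The summary statistics quoted for each group (a median with a 16th-84th percentile range) can be computed as in the following Python sketch. The function name is illustrative, and the percentile interpolation scheme (the "inclusive" method of the standard library) is an assumption, since the text does not state which one was used.

```python
import statistics


def median_and_percentile_range(values):
    """Return (median, minus_error, plus_error).

    The errors are the distances from the median to the 16th and
    84th percentiles of the distribution.
    """
    # 99 cut points: cuts[i - 1] is the i-th percentile.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    p16, p50, p84 = cuts[15], cuts[49], cuts[83]
    return p50, p50 - p16, p84 - p50


# Illustrative input: the integers 0..100, not measured sizes.
median, minus_error, plus_error = median_and_percentile_range(list(range(101)))
```

For skewed distributions such as the sizes and Sérsic indices discussed here, the two errors generally differ, which is why the range is quoted asymmetrically.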


The distribution of the axis ratios is shown in the right 
panel of Figure 4 and in Figure 5 and offers another way to 
compare our visual morphologies to a quantitative measurement. 
A population of disks with exponential profiles 
and random orientations is expected to have a relatively 
flat distribution of axis ratios that falls off at low values, 
while triaxial ellipsoids are expected to have a distribution 
that is peaked at higher values, b/a ~ 0.6 (e.g., 
Elmegreen et al. 2005; Ravindranath et al. 2006; Padilla 
& Strauss 2008; Law et al. 2012; Robertson et al. 2022). 
The median values we see for the different morphological 
groups follow this trend. The Spheroid Only group 
has the largest median axis ratio (0.64), while the 
Disk+Irregular group has the smallest (0.36). 
Figure 6 shows the axis ratio as a function of effective 
radius split into several redshift bins. In each redshift 
bin from z = 3 to z = 6, the Spheroid Only galaxies 
have the smallest median effective radius and the largest 
median axis ratios, suggestive of a population of true 
triaxial ellipsoids. Overall, we see the general trend of 


small galaxies being rounder, for all morphology types, 

Figure 6. The rest-frame optical axis ratio (b/a) plotted as a function of the effective radius (Re) in kpc in six different redshift 
bins. The colors indicate the different combinations of the main morphological class chosen by two out of three people during 
the visual classifications, as described in Section 4.1 and Figure 2. The individual sources are shown as transparent circles and 
the median for each group is shown as a star with the 16th-84th percentile range of the distribution shown as error bars. 


as seen by Padilla & Strauss (2008) and Zhang et al. 
(2019), in each of these redshift bins. We cannot draw 
any conclusions at z > 6 due to the small sample size 
and the previously mentioned uncertainties. 
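The expectation quoted above for randomly oriented disks can be reproduced with a short Monte Carlo sketch in Python. It assumes oblate disks with a single intrinsic thickness q0 and uses the classical Hubble (1926) projection relation. The value q0 = 0.2, the sample size, and the function names are illustrative assumptions, not quantities measured in this work.

```python
import math
import random


def projected_axis_ratio(cos_i, q0):
    """Apparent b/a of an oblate disk with intrinsic thickness q0.

    cos_i is the cosine of the inclination (1 = face-on, 0 = edge-on).
    """
    return math.sqrt(cos_i ** 2 * (1.0 - q0 ** 2) + q0 ** 2)


def simulate_disk_axis_ratios(n_disks, q0=0.2, seed=0):
    """Draw apparent axis ratios for randomly oriented disks.

    Random orientations correspond to cos(i) uniform between 0 and 1.
    """
    rng = random.Random(seed)
    return [projected_axis_ratio(rng.random(), q0) for _ in range(n_disks)]


axis_ratios = simulate_disk_axis_ratios(10000)
```

An infinitely thin disk (q0 = 0) gives b/a = cos(i) and hence a flat distribution. A finite thickness sets a floor at b/a = q0, which produces the fall-off at low values.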


4.3. Comparison with Non-parametric Measures 


As described in Section 3.4, we used statmorph to 
measure non-parametric image statistics for all of the 
HST-selected z > 3 galaxies in our sample across all 
NIRCam filters. We use the same filters correspond- 
ing to the rest-frame optical emission for each galaxy 
as we did for the above parametric comparison. In total, 
81% of the galaxies in our sample have a reliable fit 
from statmorph; Figures 7 and 8 highlight two of the 
commonly used methods to separate galaxies into the 
standard Hubble types and identify mergers based on 
these image statistics (e.g., Bershady et al. 2000; Con- 
selice 2003; Lotz et al. 2004, 2008b). 

The top panel of Figure 7 shows the location of each 
galaxy on the asymmetry-concentration plane, with the 
classic lines used to mark the boundaries between disk 
galaxies, elliptical galaxies, intermediate galaxies, and 


mergers for moderate redshift HST images (Bershady 
et al. 2000). While these boundaries do not cleanly sep- 
arate z > 3 galaxies into different types relative to their 
visual classifications, a few trends can be seen. On aver- 
age, galaxies with a spheroid have a higher concentration 
(C) than those with disks and irregulars. Similarly, ir- 
regular galaxies have a higher asymmetry value (A), on 
average. Very few galaxies lie above the classic demar- 
cation for mergers (Conselice 2003), and those that do 
span the full range of visual morphologies, albeit with 
a higher fraction of irregulars. Figure 8 shows the dis- 
tribution of the same galaxies on this plane, but color 
coded by the median Sérsic index for each bin. This dis- 
tribution highlights the correlation of the concentration 
value with the Sérsic index for the sample. 
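As a reminder of what the two axes of this plane measure, the sketch below computes the concentration index in its usual form, C = 5 log10(r80/r20), and applies the asymmetry threshold quoted in the Figure 7 caption. The function names are illustrative, and the radii in the usage line are invented. The Bershady et al. (2000) type boundaries are not reproduced because their coefficients are not given in the text.

```python
import math

# Merger threshold quoted in the Figure 7 caption (Conselice 2003).
MERGER_ASYMMETRY_CUT = 0.35


def concentration(r20, r80):
    """Concentration index C = 5 log10(r80 / r20).

    r20 and r80 are the radii enclosing 20% and 80% of the total light.
    """
    if r20 <= 0 or r80 <= 0:
        raise ValueError("radii must be positive")
    return 5.0 * math.log10(r80 / r20)


def above_merger_line(asymmetry):
    """True if the asymmetry exceeds the classic merger threshold."""
    return asymmetry > MERGER_ASYMMETRY_CUT


# Invented radii in arbitrary units, for illustration only.
example_concentration = concentration(r20=1.0, r80=10.0)
```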

The bottom panel of Figure 7 shows the location of 
each galaxy on the Gini-M20 plane. The lines mark 
the boundary between disk galaxies, ellipticals/S0s, and 
mergers based on nearby galaxies and then adjusted for 
galaxies at z ~ 1 (Lotz et al. 2004, 2008b). While 
there is no discernible difference between z > 3 disks 
and spheroids with this diagnostic (similar to what has 

Figure 7. Top: The rest-frame optical asymmetry value as a function of concentration for all galaxies at z > 3 split by all 
disk galaxies (left), spheroids (center), and irregulars (right). The dash-dotted lines show the boundaries between disk galaxies, 
elliptical galaxies, and intermediate galaxies from Bershady et al. (2000) and the dashed line is the dividing line above which 
nearby galaxies are expected to be major mergers (A = 0.35; Conselice 2003). Bottom: The rest-frame optical Gini value as 
a function of M20 for all galaxies at z > 3 split by all disk galaxies (left), spheroids (center), and irregulars (right). The lines 
show the boundaries between disk and elliptical galaxies (dash-dotted), and mergers (dashed) from Lotz et al. (2008b). The 
colors indicate the different combinations of the main morphological class chosen by two out of three people during the visual 
classifications, as described in Section 4.1 and Figure 2. 

been seen at lower redshift and in simulations, e.g., Lotz 
et al. 2008a; Kartaltepe et al. 2010; Pearson et al. 2019), 
irregular galaxies have higher Gini and M20 values, on 
average. 28% of galaxies with irregular features lie above 
the merger line, whereas only 17% of disks and 15% of 
spheroids do. The right-hand panel of Figure 8 shows 
the distribution of the same galaxies on this plane, but 
color coded by the median Sérsic index for each bin. 
Galaxies that occupy the ellipticals/S0 portion of this 
plane have a higher Sérsic index, on average. A signifi- 
cant fraction of the galaxies in the merger region of the 
plane also have a higher average Sérsic index. 
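The Gini statistic and the region boundaries on this plane can be sketched in Python as follows. The Gini formula is the standard one applied to sorted absolute pixel values. The line coefficients are those commonly quoted from Lotz et al. (2008), and whether they match the exact lines drawn in Figure 7 is an assumption. The function names are illustrative, and this is not the statmorph implementation.

```python
def gini(pixel_values):
    """Gini coefficient of a set of pixel fluxes.

    0 means the light is spread evenly over all pixels; 1 means all of
    the light sits in a single pixel.
    """
    values = sorted(abs(v) for v in pixel_values)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two pixels")
    mean = sum(values) / n
    if mean == 0:
        raise ValueError("all pixel values are zero")
    weighted = sum((2 * i - n - 1) * v for i, v in enumerate(values, start=1))
    return weighted / (mean * n * (n - 1))


def gini_m20_region(g, m20):
    """Region on the Gini-M20 plane (assumed Lotz et al. 2008 lines)."""
    if g > -0.14 * m20 + 0.33:
        return "merger"
    if g > 0.14 * m20 + 0.80:
        return "E/S0/Sa"
    return "Sb-Irr"
```

For example, `gini_m20_region(0.6, -1.0)` falls above the merger line, while `gini_m20_region(0.4, -1.5)` falls in the late-type disk region.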


5. DISCUSSION 
5.1. Disks and Spheroids in the Early Universe 


We find that galaxies detected by both HST and 
JWST in the z > 3 universe have a wide diversity of 
morphologies. Overall, ~60% of these galaxies have 
disks (including those with spheroids and/or irregular 
features as well) at z = 3 — 4, and this fraction has 
an apparent downward trend with increasing redshift. 
Other early JWST studies have identified candidate disk 
galaxies at these redshifts (e.g., Ferreira et al. 2022a,b; 
Robertson et al. 2022) and find similar fractions. Galax- 
ies with spheroids make up ~40% over this redshift 
range, with some variations in the higher redshift bins 
that are likely related to the small numbers in these 
bins overall or the difficulty in identifying low surface 
brightness features at these redshifts. Galaxies with a 
pure spheroid, i.e., without discernible disks or irregular 
features, make up ~ 20% across the full redshift range, 
roughly consistent with the findings of Ferreira et al. 
(2022a). While the fraction of all galaxies with irregular 
features is roughly constant at all redshifts (~40-50%), 
the fraction of galaxies that are purely irregular (i.e., 
those that have no discernible disk or spheroidal features) 
increases from ~12% at z = 3-3.5 to ~20% 
at z > 4.5. This fraction is lower than that reported by 
Ferreira et al. (2022a,b). Slight differences among these 
early studies likely arise due to the different classifica- 
tion schemes and mass ranges used. 

The distributions of axis ratios and sizes presented 
in Section 4.2 and Figures 5 and 6 suggest that our 
z > 3 sample indeed contains a mix of true disks 
and spheroids. The distribution of axis ratios for the 
galaxies classified as Spheroid Only is consistent with 
that expected from a population of triaxial ellipsoids, 
while the relatively flat distribution and lower median 
for the Disk galaxies is expected for a population of 
disks with exponential profiles and random orientations 
(e.g., Padilla & Strauss 2008). The axis ratio distribu- 
tion for the Spheroid Only group peaks at b/a > 0.6 


across the entire redshift range of our sample, while 
the median for the Disk Only group remains at b/a ~ 
0.4. The Disk+Spheroid and Spheroid+Irregular pop- 
ulations have axis ratio and size distributions that are 
intermediate. Previous theoretical work has found that 
galaxy shapes have evolved over time from prolate to 
oblate as they transition from having dark-matter-dominated 
interiors to baryon-dominated interiors following a 
compaction event (e.g., Ceverino et al. 2015; Tomassetti 
et al. 2016) and that this transition happens earlier for 
more massive galaxies (e.g., Zhang et al. 2019). 

It is worth noting that selection effects may be par- 
tially responsible for the axis ratio distributions ob- 
served for these objects. For example, it has been shown 
that the distribution of axis ratios has a strong depen- 
dence on the mass and luminosity of the galaxy popu- 
lation (e.g., Padilla & Strauss 2008; Zhang et al. 2019). 
Galaxy orientation also plays an important role in this 
distribution, as face-on disks may be more difficult to 
detect than edge-on disks at the magnitude limit (e.g., 
Elmegreen et al. 2005) and the presence of dust can im- 
pact the measured axis ratios (Padilla & Strauss 2008). 
The size of the current sample does not allow binning by 
mass, luminosity, or finer morphology groupings; however, 
future work with larger sample sizes will allow 
greater exploration of this parameter space. 

To summarize, we see evidence for galaxies with estab- 
lished disks and spheroidal morphologies across the full 
redshift range of our sample. We emphasize that the 
fractions quoted here are apparent fractions only and 
that several observational effects likely play a role in 
these measurements (as discussed in Section 5.2). Fur- 
ther work is needed to quantify our ability to pick out 
disk features and resolve spheroidal galaxies in JWST 
images at varying image depths in order to quantify 
the true fraction of galaxies with disks and spheroids 
at these redshifts. Likewise, larger samples, particularly 
at z > 6, will be needed to truly establish when the first 
disks began to form, when disks grew their bulges, and 
when spheroids emerged. 


5.2. Comparison between JWST and HST 
Morphologies 


Figure 9 shows the morphological fractions as a func- 
tion of redshift based on CANDELS HST imaging and 
using the visual classifications of Kartaltepe et al. (2015) 
for all z > 3 galaxies in all five CANDELS fields (1375 
galaxies in total). The HST classifications were limited 
to galaxies with F160W < 24.5, because fainter galaxies 
could not be reliably classified, and so only 59 galax- 
ies out of the 850 in our sample are bright enough 
to make that cut. 

Figure 9. The fraction of z > 3 galaxies detected with HST with M* > 10^9 M⊙ as a function of redshift for each morphology class 
based on the CANDELS HST visual classifications of Kartaltepe et al. (2015). The top row, from left to right, shows galaxies 
with disks, galaxies with spheroids, galaxies with irregular features, and Point Sources and Unclassifiable galaxies. The bottom 
row shows all of the same morphological groups, but divided in different ways for easy comparison. From left to right, the 
combination of all disks, all spheroids, and all irregulars; the combination of Disk Only, Spheroid Only, and Irregular Only 
groups; and finally, the remaining mixed groups. Error bars represent the 1σ binomial confidence limits given the number of 
objects in each category, following the method of Cameron (2011). 

Figure 10. The fraction of z > 3 galaxies detected with HST with M* > 10^9 M⊙ as a function of redshift for each morphology 
class based on the CEERS JWST visual classifications (as in Fig. 3) compared to the CANDELS HST visual classifications of 
Kartaltepe et al. (2015) (as in Fig. 9) for the 59 objects from EGS that were bright enough to be classified. From left to right, 
the combination of all disks, all spheroids, and all irregulars; the combination of Disk Only, Spheroid Only, and Irregular Only 
groups; and finally, objects that were unclassifiable or point sources. Error bars represent the 1σ binomial confidence limits 
given the number of objects in each category, following the method of Cameron (2011). 

A comparison of the JWST and HST morphological 
fractions for those 59 specific galaxies is 
shown in Figure 10. Based on the HST imaging alone, 
a smaller fraction of galaxies at z = 3.0-4.5 have disks 
(~40%) and a larger fraction are pure spheroids (>20% 
at z = 3.0-5.0). The fraction of galaxies that are only 
irregular is small and drops with redshift, from ~5% 
at z > 6. The fraction of galaxies that are unclassifiable 
rises sharply, from ~5% at z = 3.5 to ~35% at z = 5.5 
to ~80% at z > 7. Likewise, ~30% are unresolved at 
z = 5-7. Among the 59 galaxies with both HST and 
JWST classifications, a higher fraction is classified as a 


spheroid and a lower fraction is classified as a disk or 
irregular with HST than with JWST. At z > 6, all of 
the objects were unclassifiable or unresolved with HST. 
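With only 59 galaxies in this comparison, the binomial uncertainty on each fraction is substantial. The figure captions cite Cameron (2011), who recommends quantiles of a beta distribution for such limits. The Python sketch below follows that idea with a uniform prior and an equal-tailed 68.3% interval, using only the standard library. It is an illustration rather than the authors' code, and the function names and the counts in the usage line are assumptions.

```python
import math


def beta_cdf_integer(x, a, b):
    """Regularized incomplete beta function I_x(a, b) for integer a, b >= 1.

    Uses the identity that expresses it as a binomial tail sum.
    """
    m = a + b - 1
    return sum(
        math.comb(m, j) * x ** j * (1.0 - x) ** (m - j) for j in range(a, m + 1)
    )


def binomial_confidence_interval(k, n, level=0.683):
    """Equal-tailed interval on a fraction given k successes out of n.

    The posterior for a uniform prior is Beta(k + 1, n - k + 1); its
    quantiles are found by bisection.
    """
    a, b = k + 1, n - k + 1

    def quantile(p):
        low, high = 0.0, 1.0
        for _ in range(60):
            mid = 0.5 * (low + high)
            if beta_cdf_integer(mid, a, b) < p:
                low = mid
            else:
                high = mid
        return 0.5 * (low + high)

    tail = 0.5 * (1.0 - level)
    return quantile(tail), quantile(1.0 - tail)


# Hypothetical counts: 5 objects of a given class out of 40.
lower, upper = binomial_confidence_interval(5, 40)
```

Unlike the normal approximation, these limits stay within 0 and 1 and remain well defined when k = 0 or k = n, which matters for the sparsely populated high-redshift bins.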

The large difference seen between the HST and JWST 
morphologies at these redshifts is expected and is due 
to the difference in depth and wavelength coverage. 488 
galaxies were flagged by at least one classifier as having a 
different morphology in the JWST images compared to 
the HST images (159 galaxies were flagged by two out of 
three classifiers, see Fig. 11 for examples). A significant 
number of galaxies with disks were previously identified 
as spheroids because of their compact central morphologies, 
with low surface brightness disk features that only 
became visible with deeper imaging (see, for example, 
Conselice et al. 2011; Mortlock et al. 2013; Kartaltepe 
et al. 2015). This suggests that some fraction of the 
spheroidal galaxies observed with JWST, particularly 
those that are faint and/or at higher redshifts, possibly 
have unobserved disks as well. It is not likely that these 
disks would previously have been identified as irregu- 
lar, except for some at the low redshift end, as these 
irregular features are also too faint to be easily iden- 
tified at these redshifts with HST. At the low redshift 
end (z = 3 — 3.5), some disks may have been classified 
as irregular if the HST data only picked up the 
brighter star forming clumps rather than the underlying 
disk structure. 

To explore the impact of observed wavelength on the 
classifications, classifiers were also asked to flag objects 
for which their morphology changes (i.e., they would 
have selected different main morphology classes) be- 
tween the NIRCam short wavelength filters and the long 
wavelength filters. At least one classifier chose this flag 
for 190 galaxies and two out of three chose it for 37 (see 
Fig. 11 for examples). Note that some of the difference 
seen across the different filters may be due in part to the 
increased resolution of the short wavelength bands. This 
flag was rarely chosen, suggesting that the depth of the 
JWST images is the primary driver in the morphological 
differences observed between JWST and HST. 


5.3. Comparison with Expectations from Theory 


We compare the results of our surface brightness pro- 
file fits and non-parametric fits with the results from 
mock images and catalogs based on several different sim- 
ulations in Figure 12 in three different redshift bins. 

First, we use a mock galaxy catalog based on the Santa 
Cruz Semi-analytic model (SAM) and publicly available 
as part of the CEERS simulated data release SDR3 (footnote 12). 
The CEERS mock galaxy catalog is an augmented version 
of the EGS lightcone presented by Yung et al. 
(2022), which spans 782 arcmin^2 between 0 < z ≲ 10 
and contains galaxies with -16 ≳ M_UV ≳ -22. The physical 
properties of the galaxies are modeled with the physics- 
based Santa Cruz SAM (Somerville et al. 2015, 2021; 
Yung et al. 2019). The sizes of the disk components of 
galaxies are computed based on the ansatz that the spe- 
cific angular momentum of the halo gas is equal to that 
of the dark matter halo, and that it is conserved during 
disk formation (Mo et al. 1998; Somerville et al. 2008). 
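The angular momentum ansatz can be illustrated with the simplest form of the Mo et al. (1998) argument, in which an exponential disk in an isothermal halo has a scale length R_d = (j_d/m_d) λ r_vir / sqrt(2), where λ is the halo spin parameter and j_d/m_d = 1 when the specific angular momentum is conserved. The Python sketch below evaluates this relation. It is a deliberately simplified stand-in and not the prescription implemented in the Santa Cruz SAM. The function name and the input values are illustrative assumptions.

```python
import math


def disk_scale_length(spin, r_vir, jd_over_md=1.0):
    """Exponential disk scale length for an isothermal halo.

    spin       : dimensionless halo spin parameter (lambda)
    r_vir      : halo virial radius, in the units wanted for the output
    jd_over_md : ratio of the disk's angular momentum fraction to its
                 mass fraction; 1 means specific angular momentum is
                 conserved
    """
    return jd_over_md * spin * r_vir / math.sqrt(2.0)


# Illustrative inputs only: spin of 0.035 and a 100 kpc virial radius.
example_scale_length_kpc = disk_scale_length(0.035, 100.0)
```

With these inputs the scale length is about 2.5 kpc, and the linear dependence on λ and r_vir is the main point of the illustration.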


12 https://ceers.github.io/sdr3.html#catalogs 




Figure 11. HST and JWST postage stamps for five example 
galaxies with different morphologies in HST F160W images 
compared to JWST images or differences across the JWST 
filters. The F150W, F277W, and F356W filters are shown 
along with an RGB combination of these three filters. Each 
stamp is 2" on a side. 


We also use the publicly available (footnote 13) mock images 
and derived morphological catalogs of Costantin et al. 
(2022b) and Rose et al. (2022), which use the IllustrisTNG 
cosmological simulation (footnote 14). The IllustrisTNG 
project (Springel et al. 2018; Naiman et al. 2018; Nelson 
et al. 2018; Pillepich et al. 2018; Marinacci et al. 2018) 
is a series of large cosmological magnetohydrodynam- 
ical simulations of galaxy formation and is an update 
to the original Illustris-1 simulation (Vogelsberger et al. 
2014). It consists of three different runs that span a 
range of cosmological volumes and resolutions: TNG50, 
TNG100, and TNG300. 

Costantin et al. (2022b) produced synthetic images of 
TNG50 galaxies and used MIRaGe (footnote 15; Multi Instrument 
Ramp Generator) to simulate raw NIRCam images for 
the CEERS depth and filter combination. These images 
were then reduced using the JWST pipeline. Morphological 
measurements were made using statmorph. 


13 https://ceers.github.io/ancillary data.html 
14 https://www.tng-project.org/ 
15 https://github.com/spacetelescope/mirage 

Figure 12. The distribution of Sérsic index, size, axis ratio, and asymmetry of the z > 3 CEERS galaxies in three different 
redshift bins compared to the distribution from the CEERS mock catalog derived from the Santa Cruz Semi-analytic model 
(blue dotted line; Somerville et al. 2015, 2021; Yung et al. 2019; Yung et al. 2022), measurements from mock images based 
on IllustrisTNG50 (red dashed line; Costantin et al. 2022b), and measurements from mock images based on IllustrisTNG100 
(orange dash-dotted line; Rose et al. 2022). 

Rose et al. (2022) produced noiseless synthetic images 
with TNG100 galaxies using the public visualization 
API (Nelson et al. 2019) in each of the CEERS fil- 
ters. These images were then convolved with the model 
PSF for each filter using WebbPSF (Perrin et al. 2014). 
Poisson noise and background noise estimated from the 
JWST exposure time calculator (Pontoppidan et al. 
2016) were added to create mock images at the CEERS 
depth. Parametric models were fit using Galapagos-2 
and GalfitM while non-parametric fits were performed 
using statmorph. 

Figure 12 compares the distribution of the Sérsic in- 
dices and sizes of galaxies from the SAM, TNG50, and 
TNG100 to the distribution measured from CEERS 
galaxies (Section 4.2 and Figure 4). The overall dis- 
tributions from the SAM have very similar peaks with a 
narrower distribution, which holds for all three redshift 
bins. The Sérsic index distributions for both TNG50 and TNG100 
peak at lower values than that of the CEERS galaxies and are 
narrower at all redshifts. At z = 3-4, 
TNG50 galaxies have larger sizes than TNG100 galax- 
ies and even larger than both the SAM galaxies and the 
observed CEERS galaxies. At z > 4 the distributions 
match more closely. At all redshifts, the simulations do 
not contain the smaller (lower Re), more compact (larger 
n) galaxies that we observe with JWST CEERS imaging. 

Figure 12 also compares the measured axis ratio and 
asymmetry value for the TNG50 and TNG100 galax- 
ies to the distribution from CEERS. In all three red- 
shift bins, the axis ratios of the TNG50 and TNG100 
galaxies match each other well, but peak at higher b/a 
(~0.6) and fall off more sharply at lower values than 
the observed CEERS galaxies. At z = 3-4, the asymmetry 
distributions for TNG50, TNG100, and CEERS 
are well-matched, but the TNG50 and TNG100 distribu- 
tions shift toward lower (more negative) values at higher 
redshift. Negative asymmetry values are unphysical and 
typically result from low S/N sources, where the source 
is very close to the background level that is being sub- 
tracted when making the asymmetry measurement. 
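The origin of negative asymmetry values can be seen in a stripped-down version of the rotational asymmetry statistic, A = (Σ|I - I180| - Σ|B - B180|) / Σ|I|, where I180 is the image rotated by 180 degrees and B is a background patch. The Python sketch below is a simplified illustration and not the statmorph implementation: it skips the aperture and the centre minimisation, and it assumes the background patch has the same number of pixels as the source stamp.

```python
def rotate_180(image):
    """Rotate a 2D list of pixel values by 180 degrees."""
    return [row[::-1] for row in image[::-1]]


def abs_difference_sum(image):
    """Sum of |I - I180| over all pixels."""
    rotated = rotate_180(image)
    return sum(
        abs(a - b)
        for row, rotated_row in zip(image, rotated)
        for a, b in zip(row, rotated_row)
    )


def asymmetry(image, background):
    """Background-corrected rotational asymmetry (simplified)."""
    flux = sum(abs(v) for row in image for v in row)
    if flux == 0:
        raise ValueError("image has no flux")
    return (abs_difference_sum(image) - abs_difference_sum(background)) / flux


# A perfectly symmetric faint source on a noisy background patch.
faint_source = [[1.0, 1.0], [1.0, 1.0]]
noisy_background = [[0.5, 0.0], [0.0, -0.5]]
negative_example = asymmetry(faint_source, noisy_background)
```

Here the source itself contributes no asymmetry, so subtracting the background term drives the result to -0.5, which mirrors the low S/N behaviour described above.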

Overall, the agreement between our measurements for 
the z > 3 JWST CEERS galaxies and the various sim- 
ulations is encouraging. The differences seen (for ex- 
ample, the difference in axis ratio and the lack of small 
compact galaxies in the simulations) are worthy of a 
more in-depth look in order to determine if there are 
selection effects impacting the results or if there is an 
actual physical difference between galaxies in these simulations 
and those in the real observed universe. 

6. SUMMARY AND CONCLUSION 


In this work, we have conducted a comprehensive anal- 
ysis of 850 z > 3 galaxies detected in both HST CAN- 
DELS imaging of the EGS field and JWST CEERS NIR- 
Cam imaging. These galaxies were visually classified by 
three people each, their parametric morphologies were 
measured using Galfit and Galapagos-2/GalfitM, and 
their non-parametric morphologies were measured using 
statmorph. Our visual classification scheme contains 
classes that are intentionally not mutually exclusive so 
that we can track the properties of galaxies with differ- 
ent components separately. We compare our results to 
morphology measurements based on the HST imaging 
alone, as well as several cosmological simulations. Our 
results are summarized as follows: 


1. Galaxies detected by both HST and JWST in the 
z > 3 Universe have a wide diversity of morpholo- 
gies. Galaxies that have disks make up a large 
fraction of our sample at all redshifts, from ~ 60% 
at z = 3-4 to ~ 30% at z > 6. Galaxies 
with spheroids make up ~ 40% across the full red- 
shift range, while pure spheroids without a disk 
component or irregular features make up ~ 20%. 
The fraction of galaxies with irregular features is 
roughly constant at all redshifts (~ 40 — 50%), 
while those that are purely irregular (with no evidence 
for a disk or spheroidal component) increase from 
~ 12% at z = 3.0 — 3.5 to ~ 20% at z > 4.5. 


2. Significant differences are seen between JWST 
morphologies and the HST morphologies for the 
same galaxies. With only HST imaging, a smaller 
fraction of galaxies at z > 3 have disk, spheroid, 
or irregular features overall due to the larger frac- 
tion, particularly at z > 4.5, that are unresolved 
or unclassifiable. For resolved classifiable galaxies, 
the observed difference in classification is largely 
driven by low-surface brightness disks being too 
faint to capture in the HST imaging. 


3. The distributions of Sérsic index, size, and axis 
ratios show significant differences between the dif- 
ferent morphological groups, as expected. The 
spheroid population has a broad distribution of 
Sérsic index, and therefore, Sérsic index cannot 
be used to cleanly separate disk-dominated from 
spheroid-dominated galaxies, as has been shown 
previously based on HST imaging. Galaxies with 
a spheroid tend to be smaller, on average, than 
galaxies with disks or irregular features. 


4. The distribution of axis ratios for the Spheroid 
Only galaxies peaks at high values and is consistent 
with a triaxial population. The Disk Only, 
Irregular Only, and Disk+Irregular galaxies peak 
at lower values with an overall broad distribution, 
while the Disk+Spheroid and Spheroid+Irregular 
groups are intermediate. In general, smaller galax- 
ies tend to be rounder. 


5. While classical classification boundaries using non-parametric 
measures such as concentration, asymmetry, 
Gini, and M20 do not cleanly separate 
galaxies by their morphological type, galaxies with 
a spheroid have a higher concentration, on average, 
than disks and irregulars, while irregular 
galaxies have a higher mean asymmetry value. Irregular 
galaxies also have higher Gini and M20 values 
on average and are slightly more likely than 
disks or spheroids to lie above the merger selection 
line. 


6. The distribution of Sérsic index, size, axis ratio, 
and asymmetry of the z > 3 sample is overall well- 
matched by the distributions from the CEERS 
mock catalog derived from the Santa Cruz Semi-analytic 
model, and measurements from mock images 
based on IllustrisTNG50 and IllustrisTNG100 
galaxies. The simulations do not have the small 
compact galaxies that we observe in CEERS. The 
axis ratio distribution for TNG50 and TNG100 
galaxies peaks at higher b/a and drops off more 
sharply at lower values than the CEERS galaxies. 


Overall, these trends suggest that galaxies with estab- 
lished disks and spheroidal morphologies exist across the 
full redshift range of this study. Future work with larger 
samples that capture many more galaxies at the high 
redshift end, in conjunction with observations that can 
probe their dynamical nature, is needed to fully explore 
the parameter space, understand how these disks and 
spheroids compare to today's, and quantify the emergence 
of the first disks and spheroids. 

APPENDIX 


A. ADDITIONAL DETAILS 


Here we include some details and additional figures 
for the visual classifications and Galfit measurements. 
Figure 13 highlights the level of agreement among the 
three classifiers for the three options in the main mor- 
phological class: disk, spheroid, and irregular. For the 
visual classifications, an example set of stamps that was 
shown to the classifiers for one of the galaxies is shown 
in Figure 14. 


A1. Galfit Fits 


We used Galfit to compute parametric fits on the 
F150W and F200W images. As initial guesses, we use 
the source location, magnitude, size, position angle, and 
axis ratios from the Source Extractor catalogs and 
segmentation maps. We use cutouts of the error ar- 
ray (ERR extension) produced by the JWST pipeline as 
the input sigma images for each source, which includes 
Poisson noise from the sources themselves, as well as the 
usual instrument noise. As input PSFs, we create em- 
pirical PSFs for each filter from stacked stars from all 
four CEERS pointings. For each galaxy, the Kron radius 
measured by Source Extractor was used to scale the 
size of the cutout used as input to Galfit. All galaxies 
in the cutout within three magnitudes of the primary 
source were simultaneously fit, down to a magnitude 
limit of 27, with all other sources masked. Based on our 
testing, we find that sources fainter than this limit were 
not reliably fit. We assign each fit a quality flag. A flag 
of 0 indicates a good fit; a flag of 1 indicates that the 
fit is suspect, meaning the resulting Galfit magnitude 
differs substantially from the input Source Extractor 
magnitudes; a flag of 2 indicates a poor fit, where one or 
more parameters reached a constraint limit; a flag of 3 
indicates that the fit failed to find a solution; and a flag 
of 4 indicates that the source was not fit because 
it was either an artifact or located too close to the edge. 
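The neighbour-handling rule described above (simultaneous fits for sources within three magnitudes of the primary, down to a limit of 27, and masks for everything else) can be written as a small Python helper. The function name is hypothetical. Reading "within three magnitudes" as "no more than three magnitudes fainter", so that brighter neighbours are always fit, is an assumption of this sketch and not a statement about the actual pipeline.

```python
def neighbor_action(primary_mag, neighbor_mag, max_delta_mag=3.0, faint_limit=27.0):
    """Decide whether a neighbouring source is fit or masked.

    Returns "fit" if the neighbour is at most max_delta_mag fainter
    than the primary and not fainter than faint_limit; otherwise
    returns "mask".
    """
    close_in_brightness = neighbor_mag - primary_mag <= max_delta_mag
    bright_enough = neighbor_mag <= faint_limit
    if close_in_brightness and bright_enough:
        return "fit"
    return "mask"
```

For example, a 26th-magnitude neighbour of a 24th-magnitude primary would be fit simultaneously, while a neighbour at magnitude 27.5 would be masked regardless of the primary's brightness.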

Of the 850 galaxies in the z > 3 sample, 74% have a 
Galfit flag of 0, 13% have a flag of 1, 8% have a flag of 
2, 2% have a flag of 3, and none have a flag of 4 (these 
would already have been removed by our initial sample 
selection). For the comparisons discussed below, we use 
all of the galaxies with a flag of 0 or 1, representing 87% 
of the total sample. 

Figure 13. A matrix highlighting the level of agreement 
between the three classifiers for all galaxies with a disk, with 
a spheroid, or those with irregular features. Overall, classifiers 
regularly agree when a galaxy has a disk, agree less 
often about galaxies with a spheroid, and are more likely to 
disagree about irregular features. 

As a check on the quality of our fits, and for consistency, 
we compared the Galfit and GalfitM fits for 
the F200W filter and visually inspected the model and 
residuals. Overall, we see a high level of agreement with 
no significant offsets for the Sérsic index, the size, the 
axis ratio, and the magnitudes between the two mea- 
surements for the sources that do not reach a constraint 
limit. Throughout the paper, we use the GalfitM mea- 
surements for the filter closest to the rest-frame optical 
at the redshift of the galaxy. The median and 16th per- 
centile and 84th percentile range for the Sérsic index 
(n), the effective radius (Re), and the axis ratio (b/a) 
for each morphological type are given in Table 1 and are 
used throughout the text. 

Figure 14. An example set of postage stamps for one galaxy (CANDELS ID 16438 with z = 3.8 and log(M*/M⊙) = 10.0), 
used for the visual classification of galaxies at z > 3. The stamps are scaled by the size of the galaxy as measured by Source 
Extractor, following Equations 2 and 3 of Häussler et al. (2007), with a minimum size of 100 × 100 pixels. Two of the three 
classifiers classified this galaxy as having both a disk and a spheroid, while the third classified it as having a disk only. 

Table 1. Median Properties for Each Morphology Class 


Morphology Group            #     Redshift    log(Mass [M⊙])   n^a            Re [kpc]^a     b/a^a
All Disks                   467   3.84±0.87   9.31±0.63        …              1.19 +…/−…     0.40 +…/−…
Disks Only                  192   3.86±0.90   9.28±0.47        1.16 +…/−…     1.19 +…/−…     0.40 +…/−…
Disk+Spheroids              88    3.86±0.98   9.29±1.12        2.38 +…/−…     0.83 +…/−…     0.51 +…/−…
Disk+Irregulars             155   3.82±0.82   9.32±0.41        …              …              …
Disk+Spheroid+Irregulars    32    3.81±0.60   9.49±0.48        2.63 +…/−…     1.33 +…/−…     0.39 +…/−…
All Spheroids               323   3.94±1.04   9.30±0.71        2.48 +…/−…     0.72 +…/−…     0.56 +…/−…
Spheroid Only               156   4.03±1.11   9.29±0.47        2.46 +…/−…     0.54 +…/−…     0.64 +…/−…
Spheroid+Irregulars         47    3.90±1.12   9.21±0.39        2.92 +…/−…     0.72 +…/−…     0.55 +…/−…
All Irregulars              376   3.90±0.93   9.29±0.43        …              …              …
Irregular Only              142   4.02±1.02   9.24±0.44        …              …              …
Point Source / Unresolved   16    5.27±1.82   9.66±0.65
Unclassifiable              18    4.57±1.54   9.47±0.84


^a Sérsic index (n), effective radius (Re in kpc), and axis ratio (b/a) as measured in the NIRCam 
filter that most closely represents the rest-frame optical for the redshift of the galaxy: F277W for 
3.0 < z < 4.0, F356W for 4.0 < z < 4.5, and F444W for z > 4.5. 
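
The filter choice in the note to Table 1, and the median with 16th to 84th percentile range quoted for each morphology class, can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the code used for the paper. The function names are hypothetical, and the treatment of the bin edges (z exactly 4.0 or 4.5 goes to the redder filter) is an assumption, since the note gives open intervals only.

```python
import numpy as np


def rest_optical_filter(z: float) -> str:
    """Return the NIRCam filter closest to the rest-frame optical at redshift z.

    The redshift bins follow the note to Table 1. Assigning the exact bin
    edges to the redder filter is an assumption.
    """
    if z < 3.0:
        raise ValueError("the sample is restricted to z > 3")
    if z < 4.0:
        return "F277W"
    if z < 4.5:
        return "F356W"
    return "F444W"


def median_with_range(values):
    """Return (median, upper error, lower error) from the 16th, 50th,
    and 84th percentiles, ignoring NaN entries such as failed fits."""
    data = np.asarray(values, dtype=float)
    p16, p50, p84 = np.nanpercentile(data, [16, 50, 84])
    return p50, p84 - p50, p50 - p16
```

For example, the galaxy shown in Figure 14 (z = 3.8) would be measured in F277W under this mapping, and `median_with_range` applied to the Sérsic indices of one morphology class would give a median with asymmetric upper and lower errors.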


